Preamble

Things that should affect the matchup:

  • Players (skill)
  • Player order
  • Specs
  • Starter deck
HMC diagnostics were checked for each of the eight fitted models:

  • Partial pooling on independent decks
  • Partial pooling on starters and specs
  • Partial pooling and interactions on starters and specs
  • Partial pooling and full interaction between starters and specs
  • Versus model on starters and specs, forum data only
  • Versus model on starters and specs with Metal data
  • Versus model on starters and specs with full Metal data
  • Versus model on starters and specs with negative player skills, forum data only

For every model, 0 of 4000 iterations ended with a divergence, 0 of 4000 iterations saturated the maximum tree depth (10 for the first four models listed, 12 for the last four), and E-BFMI indicated no pathological behavior.

Prior choice

See Prior choice.

Models

Mean performance models

Simple deck

In this version, we consider only players, decks as a whole (rather than their components), and the turn effect on both.

We don’t bother showing the prior samples here, because they’re the same as before.

Posterior predictive checks

It’s hard for me to get an idea of how accurate the model is from these results alone, so let’s look at its predictions for some matches. Looking at the model’s post-hoc predictions for the matches used to fit it is a cheat, since we’re using the data twice, but it should give a rough idea of how good it is.

First, we’ll just list the predicted probability for the observed outcome of each match.

There aren’t many upsets here:

This is easier to see in the density plot below.

The post-hoc predictions are heavily lopsided towards unbalanced matchups. Then again, these are post-hoc predictions, so we’d expect them to lean towards being correct, to some extent. A more helpful check is to compare how often player 1 actually wins matches with how often the model thinks they should.
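A minimal sketch of that comparison, assuming each match is summarised by the model’s predicted probability that player 1 wins and the observed outcome (1 if player 1 won, 0 otherwise). The function name `calibration_table` and the choice of ten equal-width bins are illustrative assumptions, not taken from the analysis itself.

```python
from typing import List, Tuple


def calibration_table(
    predicted: List[float], outcomes: List[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Group matches by predicted P(player 1 wins) and compare with observed win rates.

    Returns (mean predicted probability, observed player-1 win rate, match count)
    for each non-empty bin, ordered from the lowest to the highest predictions.
    """
    if len(predicted) != len(outcomes):
        raise ValueError("predicted and outcomes must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, outcomes):
        # Equal-width bins on [0, 1]; p == 1.0 is placed in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue  # skip bins with no matches
        mean_p = sum(p for p, _ in members) / len(members)
        win_rate = sum(y for _, y in members) / len(members)
        table.append((mean_p, win_rate, len(members)))
    return table
```

If the observed win rates in the outer bins sit further from 0.5 than the mean predicted probabilities, the predictions are not lopsided enough; if they sit closer to 0.5, the predictions are overconfident.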

It looks like the post-hoc predictions currently aren’t lopsided enough! The actual outcomes are even more extreme than predicted.

We also give the average score for each model, for two types of proper scoring rule:


For reference, putting 50-50 odds on player 1 winning would give a logarithmic score of -0.6931472 (that is, ln 0.5) and a Brier score of 0.25, regardless of the true win rate. The scores have different signs, but in both cases better models have a score closer to zero. So we’re definitely doing better at predicting match results than just flipping a coin, and I don’t have to hang up my statistician hat in shame. You can also see how much improvement we got just from switching to a “versus model”, where deck components are only evaluated relative to the opposing deck components.
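The two scoring rules can be sketched as follows, assuming each match is summarised by the predicted probability that player 1 wins and a 0/1 outcome. This uses the convention where the logarithmic score is the mean log-probability assigned to the observed result, so it is at most zero; some authors flip the sign. The function names are illustrative.

```python
import math
from typing import Sequence


def mean_log_score(predicted: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean log-probability given to the observed outcome (<= 0; closer to 0 is better)."""
    total = 0.0
    for p, y in zip(predicted, outcomes):
        # Probability the model assigned to what actually happened.
        total += math.log(p if y == 1 else 1.0 - p)
    return total / len(predicted)


def mean_brier_score(predicted: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between P(player 1 wins) and the 0/1 outcome (>= 0; closer to 0 is better)."""
    return sum((p - y) ** 2 for p, y in zip(predicted, outcomes)) / len(predicted)


# A coin-flip forecaster gets the same scores whatever the outcomes are.
coin = [0.5] * 4
results = [1, 0, 1, 1]
print(round(mean_log_score(coin, results), 7))  # -0.6931472
print(mean_brier_score(coin, results))          # 0.25
```

Both rules are proper, so a model cannot improve its expected score by reporting probabilities other than the ones it actually believes.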

Results for mean models

Turn advantage

Player skill

Deck strength

Deck-only model

And now for deck info (the DeGrey-ding deck is [Present]/Strength/Anarchy):

Deck by spec and starter

And now for deck info:

Inter-deck interactions

And now for deck info:

##     Purple/Finesse     White/Ninjitsu           Red/Fire   Black/Discipline 
##      -0.3103452423      -0.2319162543      -0.2226147568      -0.1691474648 
##    Neutral/Anarchy     Purple/Balance     Neutral/Growth Neutral/Discipline 
##      -0.1610282786      -0.1418095916      -0.1384370357      -0.1165525082 
##       White/Growth     Red/Demonology          Black/Law        White/Truth 
##      -0.1123728707      -0.1085387434      -0.1082192625      -0.1044260592 
##        Black/Truth         White/Fire        Red/Disease         Blue/Truth 
##      -0.1042230843      -0.1025983148      -0.1022270518      -0.0994462186 
##          Red/Truth          Red/Blood        Green/Feral      Neutral/Feral 
##      -0.0963077918      -0.0927326251      -0.0898220735      -0.0893438132 
##        Red/Present       Purple/Feral      Black/Finesse         Blue/Peace 
##      -0.0869380962      -0.0855959276      -0.0847593582      -0.0826820165 
##       Black/Future    Neutral/Disease   White/Necromancy       Blue/Anarchy 
##      -0.0810185804      -0.0755938006      -0.0729893086      -0.0723147749 
##       Blue/Balance          Green/Law   White/Discipline        White/Feral 
##      -0.0703661482      -0.0674767841      -0.0669843768      -0.0637144312 
##     Black/Strength    Neutral/Balance        Red/Finesse        Green/Peace 
##      -0.0629020929      -0.0617714755      -0.0615365473      -0.0586553930 
##      Black/Present Neutral/Demonology       Blue/Disease    Blue/Discipline 
##      -0.0584456351      -0.0565806325      -0.0539330816      -0.0538362301 
##       Black/Growth        Neutral/Law      Green/Anarchy        Black/Peace 
##      -0.0524527073      -0.0516126153      -0.0516050495      -0.0506812883 
##           Red/Past     Purple/Anarchy        Green/Blood    Neutral/Present 
##      -0.0486322123      -0.0470434578      -0.0431687247      -0.0333807017 
##       Green/Future   Neutral/Ninjitsu      Purple/Future      Green/Finesse 
##      -0.0286116176      -0.0276785712      -0.0268192102      -0.0221347698 
##   Green/Demonology       Neutral/Fire     Red/Discipline          Blue/Past 
##      -0.0215432332      -0.0196479186      -0.0186855461      -0.0172415786 
##    Neutral/Finesse      White/Balance        White/Peace         Purple/Law 
##      -0.0154442053      -0.0134172529      -0.0127068804      -0.0081410746 
##          Blue/Fire         Blue/Feral        Black/Feral      Blue/Strength 
##      -0.0061764994      -0.0059306906      -0.0059244511      -0.0048559827 
##       Neutral/Past         Green/Past       White/Future         Red/Future 
##      -0.0047435754      -0.0042487428      -0.0038975838      -0.0036932884 
##       Blue/Bashing         Green/Fire       Red/Ninjitsu          Red/Peace 
##      -0.0033118002      -0.0030087429      -0.0027044074      -0.0027034816 
##      Blue/Ninjitsu     Black/Ninjitsu      White/Bashing   Green/Discipline 
##      -0.0018385937      -0.0013090756      -0.0012413424      -0.0010294659 
##         Black/Past    Blue/Necromancy         Blue/Blood       Blue/Present 
##      -0.0008021038      -0.0007309166      -0.0005411002      -0.0004122199 
##     Purple/Disease    Purple/Ninjitsu        Red/Bashing      Black/Bashing 
##      -0.0001772368       0.0001461503       0.0002211268       0.0003178291 
##  Purple/Discipline       Purple/Truth     Purple/Bashing     Neutral/Future 
##       0.0006328003       0.0007684904       0.0010770861       0.0017715380 
##        Blue/Growth        Red/Balance    Blue/Demonology        Purple/Past 
##       0.0022658816       0.0024995449       0.0039317643       0.0039657235 
##        Blue/Future        Purple/Fire       Green/Growth     Green/Ninjitsu 
##       0.0045078487       0.0058554120       0.0062032863       0.0063856326 
##    Neutral/Bashing            Red/Law      Green/Bashing           Blue/Law 
##       0.0077008357       0.0078900241       0.0084118540       0.0124203907 
##      Purple/Growth          Red/Feral    Purple/Strength   Neutral/Strength 
##       0.0125238524       0.0150078694       0.0271956819       0.0303500917 
##        Black/Blood   White/Demonology     Red/Necromancy      Neutral/Blood 
##       0.0303688652       0.0330827185       0.0356370208       0.0384560799 
##     Purple/Present   Green/Necromancy        Green/Truth         White/Past 
##       0.0420310430       0.0425639696       0.0431436551       0.0440370243 
##      White/Disease      White/Present     Green/Strength      Green/Disease 
##       0.0451647433       0.0544240645       0.0565701468       0.0676941492 
##        Red/Anarchy      Green/Balance      Black/Balance  Purple/Necromancy 
##       0.0688197416       0.0715459729       0.0768072710       0.0867323367 
##      White/Anarchy      Green/Present      Black/Disease       Purple/Peace 
##       0.0886880348       0.0897687253       0.1033477643       0.1056341736 
##     White/Strength Neutral/Necromancy        White/Blood         Red/Growth 
##       0.1084731250       0.1204313819       0.1211758152       0.1293758949 
##       Blue/Finesse       Red/Strength   Black/Demonology  Purple/Demonology 
##       0.1396179638       0.1480477716       0.1510609558       0.1933551883 
##      Neutral/Truth       Purple/Blood   Black/Necromancy      Neutral/Peace 
##       0.1941557007       0.2129897239       0.2280178147       0.2713143742 
##          White/Law         Black/Fire      Black/Anarchy      White/Finesse 
##       0.2740246064       0.3004659831       0.3432983694       0.4100709571

Versus models

In retrospect, a deck’s strength will be heavily dependent on the opposing deck, so ranking deck strengths on a single scale is silly. So we now use models that evaluate deck parts only with respect to an opposing part.
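A minimal sketch of how a versus-style model can turn its parameters into a win probability. It assumes a logistic link, additive player skills, a single first-player (turn) term, and a table of pairwise log-odds between a component of player 1’s deck and a component of player 2’s deck, with the reverse pairing taken as the negative. All names here (`matchup_effect`, `p1_win_probability`, the example table) are hypothetical, and the fitted models may parameterise things differently.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical table: (player 1 component, player 2 component) -> log-odds advantage.
Matchup = Dict[Tuple[str, str], float]


def matchup_effect(table: Matchup, a: str, b: str) -> float:
    """Log-odds advantage of component a over component b, assuming antisymmetry."""
    if (a, b) in table:
        return table[(a, b)]
    if (b, a) in table:
        return -table[(b, a)]
    return 0.0  # unlisted pairs are treated as even


def p1_win_probability(
    skill1: float,
    skill2: float,
    deck1: Sequence[str],
    deck2: Sequence[str],
    turn_advantage: float,
    table: Matchup,
) -> float:
    """P(player 1 wins): skills and turn effect plus every cross-deck component pairing."""
    logit = skill1 - skill2 + turn_advantage
    for a in deck1:
        for b in deck2:
            logit += matchup_effect(table, a, b)
    return 1.0 / (1.0 + math.exp(-logit))


example = {("Fire", "Growth"): 0.4}
print(p1_win_probability(0.0, 0.0, ["Fire"], ["Fire"], 0.0, example))  # 0.5
```

Because each component is only ever scored against an opposing component, there is no single per-component strength to rank, which is the point of moving to a versus formulation.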

Player skill


This is getting rather crowded! Here’s a version with only players that have been active in 2018 or 2019.


Split-deck model

Split-deck model with Metalize data

Split-deck model with full Metalize data

This includes all the casual matches recorded by Metalize, in addition to the tournament matches.

Split-deck negative-skill model

Here are the base and negative-skill plots next to each other, for comparison: